ABSTRACT

At the present time, data communication networks
transmit much information besides the actual user's messages.
This "extra" information is called protocol information.
This thesis extends Gallager's initial work in providing an
information-theoretic lower bound to how much of this protocol
information is absolutely necessary for the proper operation
of a network. The lower bound is a function of the average
amount of time messages are allowed to be delayed before
being transmitted. The bound suggests that the strategies
considered by Gallager are close to optimal.

Introduction and Basic Concepts

In the past decade, much of the work in communications
has dealt with various problems involved with the implementation
of data communication networks. In its general
form  a data  communication  network  may  be  considered  as  a 
finite  collection  of  nodes  interconnected  by  communication 
links.  Each  node  may  have  a finite  number  of  sources  and/or 
receivers  connected  to  it.  A source  produces  messages 
which  must  be  transmitted  through  the  network  to  a specified 
receiver. 

We  define  a message  as  a finite  sequence  of  binary 
digits.  The  type  of  source  which  is  of  greatest  practical 
interest is that which produces messages whose lengths are
stochastic,  generally  short  compared  to  the  length  of  the 
idle  periods  between  messages,  and  whose  starting  times  are 
random.  We  will  call  such  a source  sporadic,  and  will  model 
the  message  starting  times  as  a Poisson  process.  (Fuchs  and 
Jackson,  1969) 

It  is  the  job  of  the  nodes  to  route  messages  through 
the  network.  There  are  many  options  available  as  to  exactly 
how  these  nodes  will  accomplish  their  task  and  much  has 
been  written  on  various  aspects  of  these  options. 

One obvious fact is that more information must be
transferred from the source to the nodes than just the data
to be communicated to the receiver, e.g., there must be
information that tells the node what the destination of
the message is. This thesis extends the work of (Gallager,
1976) and establishes a lower bound on how much of this
extra information must be sent. We refer to this "extra
information" as protocol information. The term "protocol"
is often used in connection with any control information
in the network, but we will reserve this term for that
control information which is absolutely necessary for the
network to transmit messages properly.

There  are  many  problems  in  data  networks  whose  solutions 
generate  control  information  that  will  not  be  considered 
here. We now mention some of these problems and show that
their needed information is separable from the basic
protocol information. This will enable us to define more
precisely  that  control  information  which  we  are  considering 
to  be  protocol  information. 

First of all, we assume that the network topology is
static, that is, we do not deal with the problems of adding
or removing sources, receivers, or links. Control
information normally generated to handle such problems can
be viewed as ordinary messages generated by special sources
in the network and need not concern us here. Other control
information which we regard in the same manner is that
necessary to ensure user privacy, that used to control the
flow of messages into the network, and that required to
deal with flexible routing strategies. We also consider
all error correcting and error detecting information to be
imbedded within the message itself; the major problem such
error control procedures present to the network is that of
the variable delay for processing and retransmission. We
choose to ignore this problem and other problems which cause
variable delay. We therefore assume that our messages are
transmitted across the network with some fixed delay.

Now  that  we  have  mentioned  that  control  information 
which  is  not  of  interest  in  this  thesis,  we  next  consider 
the  types  of  control  information  which  do  come  under  our 
definition  of  protocol.  Any  information  which  the  receiver 
learns  simply  by  receiving  the  message  is  necessarily  sent 
whether  intentionally  or  not.  Thus  protocol  information 
includes certain amounts of addressing information (where
the message has arrived), starting time information, and
message length information. In the next chapter we shall
look at particular examples of networks and identify issues
surrounding  such  protocol  information. 


Chapter  II 

Examples  of  Networks 
Since we are trying to discover the amount of protocol
required by every network, we first envision the simplest
possible network, reasoning that this network will require
the least amount of protocol. A very simple system would
be that in which each source is allowed to communicate
with only one receiver and this communication takes place
over a dedicated communication link (or, more precisely,
over a fixed fraction of the capacity of each link on the
path between the source and receiver).

The  fact  that  each  source  can  communicate  with  only 
one  receiver  is  only  a conceptual  simplification.  If  a 
source  actually  wants  to  transmit  to  N receivers,  it  can 
conceptually  be  partitioned  into  N sources,  each  of  which 
transmits only to one receiver. A receiver that wants to
receive from more than one source can be similarly partitioned.

We shall call such a system a multiplex system because
of its assignment of fixed fractions of each link capacity
to each source-receiver pair.

At first glance, it would appear that this system
uses no protocol information at all. Indeed, if each
source were to produce messages which were always of the
same length and were to produce these messages at regular
intervals, then this scheme would require no protocol
information. Of course, this contradicts our assumption
that the sources are sporadic.
We now consider the multiplex system with sporadic
sources. The problem here is that the receiver must be
able to tell when there is a message on the link and when
the link is idle. Otherwise, it might interpret "noise"
on the channel as a spurious message. One method of
accomplishing this is for the transmitter to have a special
idle character or "flag" which is repeatedly sent when it
has no message to send and which cannot appear at the
beginning of a message. The first absence of the idle character
signals the start of a message and the next appearance
signals the end of a message. Alternatively, the transmitter
can use a special character or flag to signal explicitly
the beginning of a message. Information must then also be
provided about the length of each message by either prefixing
the message with a header or by providing another (possibly
different) flag at the end of the message. All such "extra"
information is protocol information, necessary for proper
operation of the network. Indeed, in a multiplex system,
the amount of protocol information for each source-receiver
pair is determined a priori as the amount of channel capacity
assigned to the pair in excess of the source rate. If
the rate of the source is known, we may set the channel
capacity as close to the source rate as we like but,
since the source is sporadic, the result is a queueing
delay which increases as the channel capacity decreases.
We note finally that the explicit protocol information in
the multiplex system consists of message starting time and
message length information. No specific address information
is required.
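
The idle-character scheme described above can be illustrated with a minimal sketch. It assumes that symbols are characters, that the idle character is a hypothetical `~`, and (more strictly than the text requires) that the idle character never appears anywhere inside a message. The names `frame` and `deframe` are illustrative only.

```python
IDLE = "~"  # hypothetical idle character; any symbol reserved for this purpose would do


def frame(messages, gap=2):
    """Build the transmitted stream: idle characters between and around messages."""
    if gap < 1:
        raise ValueError("at least one idle character must separate messages")
    stream = [IDLE * gap]
    for msg in messages:
        if not msg or IDLE in msg:
            raise ValueError("messages must be non-empty and must not contain the idle character")
        stream.append(msg)
        stream.append(IDLE * gap)
    return "".join(stream)


def deframe(stream):
    """Recover the messages: a message starts at the first non-idle symbol
    and ends at the next appearance of the idle character."""
    return [chunk for chunk in stream.split(IDLE) if chunk]
```

Here the idle characters carry both the starting time and the length of each message, and no address is ever sent, which matches the observation that a multiplex system needs no specific address information.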

A second type of system, more commonly used than multiplex
systems in data communication networks, is a message
switching system. In this latter type of system, messages
are preceded by an encoding of the receiver's address and
of the message length. These "header" digits represent
protocol information. Each intermediate node looks at the
protocol bits and learns from this header where it must
send the message in order to move it across the network.
(How the node decides where to send the message once it has
the receiver's address is a large and interesting problem
unto itself. This problem, called the routing problem, will
not concern us here.) Note that, to do its job, the node
must also learn the message length so that it will know how
many of the bits following the header should be sent toward
the destination. When the message reaches its destination
the receiver is alerted to this fact by recognizing its own
address. Thus, for the receiver, the address in a switching
system performs the same role as the starting flag in a
multiplex system.

It is easy to see that, for a network with a large number
of receivers, the information contained in the header
will be much larger than that contained in a flag; thus,
from a protocol standpoint, a multiplex system using starting
time protocols is much more efficient than a message switching
system using address protocols. The problem is that,
in order to reduce the protocols in the multiplex system,
we must incur large delays.

We now explore a third type of system which, when the
number of sources per node becomes very large and the
capacity of the channel increases appropriately, can be
made to exhibit negligible queueing delay. This system
has an added advantage in that, if one chooses to allow
delay, one can reduce the protocol information. This system
is a variation of the multiplex system; we will call it a
statistical multiplexing system.

Consider a large number $n$ of sources which want to
communicate over a common link. They can share the capacity
of the link in the following manner:

Let each source have a queue associated with it and,
in addition, let there be a central queue at the node. The
transmitting node services the source queues cyclically,
servicing each queue every $T_s$ seconds. Servicing a source
queue corresponds to transferring the contents of the source
queue to the central queue (see Fig. 1). The contents of
the central queue are then transmitted to the receiver with
protocol information added to tell the receiving node from
which source the message is coming and how long the message is.

Figure 1: A Statistical Multiplexing System

Assume that each source is Poisson and on the average
emits $\alpha$ messages per second. Given any service time $T_s$, if
the number of sources $n$ is large enough, the law of large
numbers implies that with high probability, the number of
messages entering the central queue every $T_s$ seconds
divided by the number of sources is very close to $\alpha T_s$. In
order to be able to keep pace, the channel must have a
capacity only a slight percentage in excess of that needed
to transmit $n\alpha$ messages and their associated protocol
information every second. No more than $\log_2 n$ bits per
message are needed to specify the message origin. If the
message length is a random variable $M$, then very close to
$H(M)$ bits per message will be necessary to specify the
message lengths, where $H(M)$ is the entropy of $M$. If we
assume that bits in a message are zero or one with equal
probability, then very close to $E(M)$ bits per message will
be necessary to specify the messages themselves. We have
again invoked the law of large numbers in these last two
statements. We can therefore set the capacity to be only
a slight percentage in excess of $n\alpha(E(M) + H(M) + \log_2 n)$
bits per second. Because we need exactly $\alpha(E(M) + H(M) + \log_2 n)$
bits per second per source with probability close to one, we
don't need much margin in order to guarantee that with high
probability no message stays in the central queue more than
$T_s$ seconds.
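
As a worked illustration of the capacity estimate $n\alpha(E(M) + H(M) + \log_2 n)$, the following sketch simply evaluates that expression. The function name and the 5 percent margin in the usage note are illustrative assumptions, not values from the text.

```python
import math


def required_capacity_bps(n, alpha, mean_len_bits, len_entropy_bits):
    """Evaluate n * alpha * (E(M) + H(M) + log2 n) in bits per second.

    n                -- number of sources sharing the link
    alpha            -- messages per second emitted by each source
    mean_len_bits    -- E(M), the mean message length in bits
    len_entropy_bits -- H(M), the entropy of the message length in bits
    """
    origin_bits = math.log2(n)  # bits needed to name the originating source
    per_message = mean_len_bits + len_entropy_bits + origin_bits
    return n * alpha * per_message
```

For example, with 1024 sources, 0.5 messages per second per source, $E(M) = 2$ bits and $H(M) = 2$ bits, each message costs 14 bits and the link needs slightly more than 7168 bits per second; a hypothetical 5 percent margin would give about 7526 bits per second.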

For the statistical multiplexing system, the average
delay per message in the central queue is $T_s/2$ seconds. The
average delay per message in each source queue is also $T_s/2$ seconds,
so the total average delay per message is $T_s$ seconds. As
$n$ increases without limit we can thus make the queueing
delay arbitrarily small if we so desire, but we shall
see that we also have the option of increasing the delay
in order to reduce the necessary protocol information.

If $T_s$ is small, the probability of finding a message
in a source queue is small each time the source queue is
sampled. Because of this, the messages in the central queue
will be from a completely random selection of sources and
we can do no better than to label each message with $\log_2 n$
bits per message to indicate its origin (or, equivalently,
its destination). If we allow $T_s$ to increase, we can
decrease this protocol information.

If $T_s$ is large, there will with high probability be more
than one message from each source entering the queue during
each sampling period. The transmitting strategy can then
be the following: transmit all the messages in the queue
from source 1, with a header telling how many such messages
there are and the message lengths; then do the same for
source 2, and continue for each source in turn, returning
to source 1 $T_s$ seconds later after sending the messages
from the last source. This cycle will also have a period
of $T_s$ seconds when there is no excess capacity. If there
is excess capacity, idle symbols will need to be sent to
fill out the $T_s$ second period, but, since the amount of
excess capacity needed is only a small percentage, these
idle symbols represent a negligible amount of protocol
information per message. The protocol information in this
situation is the message length information, which we can
do nothing about, and the $n$ independent random variables
indicating how many messages from each source arrived in
the central queue in a period $T_s$. These random variables
are Poisson-distributed with mean $\alpha T_s$, and on the average
$n\alpha T_s$ messages arrive per period, so the message
starting time protocol per message is $\frac{1}{\alpha T_s} H(P_{\alpha T_s})$, where
$H(P_{\alpha T_s})$ is the entropy of a Poisson random variable with
mean $\alpha T_s$. We will show later that, for fixed $\alpha$ and large
$T_s$, $H(P_{\alpha T_s}) \approx 1.41 + \frac{1}{2}\ln(\alpha T_s)$ nats, so the starting time
protocol per message goes to zero as $T_s$ goes to infinity. In
fact, this protocol decreases monotonically with increasing
$T_s$. We see that in this statistical multiplexing system
there is a direct tradeoff possible between starting time
protocols and message delay. The nature of this tradeoff
is the subject of this thesis. In the sequel, we will
develop a lower bound on the amount of starting time protocol
information necessary to operate within a certain
constraint on the average message delay. This bound will
hold for all networks with the proper source characteristics
and will be independent of the particular network strategy
used.
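
The tradeoff between delay and starting time protocol can be checked numerically. The sketch below computes the entropy of a Poisson random variable by direct summation and divides by the mean to get the starting time protocol per message. The function names and the truncation point of the sum are assumptions made for illustration.

```python
import math


def poisson_entropy_nats(mean):
    """Entropy, in nats, of a Poisson random variable with the given mean."""
    if mean <= 0:
        return 0.0
    # Truncate the sum far enough into the tail that the neglected mass is negligible.
    k_max = int(mean + 20.0 * math.sqrt(mean) + 50)
    entropy = 0.0
    for k in range(k_max + 1):
        # log of the Poisson probability of k, computed in the log domain for stability
        log_p = -mean + k * math.log(mean) - math.lgamma(k + 1)
        entropy -= math.exp(log_p) * log_p
    return entropy


def starting_time_protocol_nats(alpha, t_s):
    """Starting time protocol per message, H(P_{alpha*T_s}) / (alpha*T_s), in nats."""
    mean = alpha * t_s
    return poisson_entropy_nats(mean) / mean
```

For a mean of 100 the summed entropy lies within 0.01 nats of $\frac{1}{2}\ln(2\pi e) + \frac{1}{2}\ln 100$, where $\frac{1}{2}\ln(2\pi e) \approx 1.42$, and the per-message value falls steadily as $T_s$ grows.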


Note that in the above example, despite the fact that
the sources are sharing the central queueing facilities, the
amount of protocol information per message in the network
is equal to the sum of the protocol per message of each
source divided by the number of sources, i.e., to the average
protocol information per message for a single source.
Thus, if we assume identical statistics for each source,
then we need analyze only one source-receiver pair to
establish a lower bound on protocol information.




Chapter  III 

Gallager's  Problem  and  Results* 
In this section we present the original problem as
formulated  by  Gallager  and  review  his  results. 

We assume that each source in the network emits messages
at random times. The message starting times are assumed to
form a stationary Poisson process. As mentioned in the
previous section, we need to consider only one source-receiver
pair. Let the Poisson process for this source
have parameter $\alpha$; that is, $\alpha$ is the expected number of
message emissions per second, and $1/\alpha$ is the mean interarrival
time. We assume further that the entire message arrives
instantaneously at a processor associated with the source.
This processor has the option of holding the message for an
unspecified length of time and then sends it along the
network. This processor may be equivalently thought of either
as part of the source or as part of the first node. The
message is then transmitted instantaneously across the
network to the receiver.

The messages from the source have independent lengths
($M$) described by a probability mass function $P_M(m)$.
Introducing such delay does not change the necessary amount of
protocol about message lengths. The transmitted protocol
information per message must be at least

$$H(M) = -\sum_{m=1}^{\infty} P_M(m) \log P_M(m).$$

*All  references  to  Gallager  refer  to  (Gallager,  1976) 
unless  otherwise  noted. 


If we further assume that the message lengths are distributed
geometrically with parameter $\epsilon$ (i.e., the average
message length is $1/\epsilon$), this expression reduces to $H(M) = \frac{1}{\epsilon}H(\epsilon)$,
where $H(x) = -x\log x - (1-x)\log(1-x)$ is the binary entropy
function. Let $X_i$, $i = 1, 2, \ldots$, be the message arrival times
and $Y_i$, $i = 1, 2, \ldots$, be the times at which the intermediate
processor sends the messages on their way. Since transmission
is assumed to be instantaneous, $Y_i$ is also the time
the $i$th message arrives at the receiver. Let $D_i = Y_i - X_i$ be
the delay for the $i$th message. The Poisson arrival assumption
implies that the interarrival times $X_i - X_{i-1}$, $i = 1, 2, \ldots$
(with $X_0 = 0$), are independent and that each has a probability density
$p(t) = \alpha e^{-\alpha t}$.
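
The closed form $H(M) = \frac{1}{\epsilon}H(\epsilon)$ for geometric message lengths can be checked against the defining sum. This sketch assumes the geometric distribution $P_M(m) = \epsilon(1-\epsilon)^{m-1}$ for $m \ge 1$, which has mean $1/\epsilon$, and uses base-2 logarithms; the function names and the number of terms are illustrative choices.

```python
import math


def binary_entropy_bits(x):
    """H(x) = -x log2 x - (1 - x) log2 (1 - x), with H(0) = H(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1.0 - x) * math.log2(1.0 - x)


def geometric_length_entropy_bits(eps, terms=5000):
    """Entropy of M by direct summation of -P(m) log2 P(m), truncated after `terms` terms."""
    total = 0.0
    for m in range(1, terms + 1):
        p = eps * (1.0 - eps) ** (m - 1)
        if p == 0.0:  # remaining terms underflow and contribute nothing
            break
        total -= p * math.log2(p)
    return total
```

With $\epsilon = 0.5$ both expressions give 2 bits per message, and with $\epsilon = 0.1$ both give about 4.69 bits per message.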

For any given network, for any given scheme for transmitting
messages across that network, and for any given $N$,
there is a joint probability distribution $P_N(X^N;Y^N)$ on
$X^N = (X_1, X_2, \ldots, X_N)$ and $Y^N = (Y_1, Y_2, \ldots, Y_N)$. This joint
distribution must satisfy certain constraints. The marginal
distribution on $X^N$ must be consistent with the Poisson
assumption. Also, the delay $D_i$ must be non-negative with
probability one for each $i$. $P_N(X^N;Y^N)$ defines a mutual
information $I_{P_N}(X^N;Y^N)$ between $X^N$ and $Y^N$. $I_{P_N}(X^N;Y^N)$ gives
the information about the arrival times $X^N$ that the receiver
learns just by receiving the messages at times $Y^N$. This
information is sent along with the messages whether the
network wants to send it or not! Therefore, this information
serves as a lower bound to the amount of starting time
protocol information necessarily sent. Note that we are
not saying that this information is necessarily in a usable
form or that the receiver desires to have it, but only that
it must be sent whether we want to send it or not. (In the
message switching network of the previous chapter, we need
not explicitly send message starting time information, but
such information is present nonetheless under the guise of
addressing information.)

$P_N(X^N;Y^N)$ also establishes another quantity of interest,
the expected delay per message. This quantity is defined by

$$\bar{D}_N = \frac{1}{N}\sum_{i=1}^{N} E(D_i) \tag{1}$$

where $E(D_i)$ is the expected value of $D_i$ with respect
to the probability distribution $P_N(X^N;Y^N)$. Let $\mathcal{P}_N(d)$ be the
class of probability distributions $P_N(X^N;Y^N)$ which satisfy
the above constraints and have $\bar{D}_N \le d$. We want to find the
minimum protocol information in a network with an expected
delay per message less than or equal to $d$, so we minimize
$\frac{1}{N} I_{P_N}(X^N;Y^N)$ over $P_N(X^N;Y^N) \in \mathcal{P}_N(d)$. This is the classical
rate-distortion problem. Let

$$R_N(d) = \inf_{P_N(X^N;Y^N) \in \mathcal{P}_N(d)} \frac{1}{N} I_{P_N}(X^N;Y^N) \tag{2}$$

be the $N$th order rate-distortion function. The lower bound
we are looking for then is the rate-distortion function

$$R(d) = \lim_{N \to \infty} R_N(d) \tag{3}$$

Unfortunately we have not been able to compute $R(d)$.
Gallager proved the following theorem which provides a
lower bound to $R(d)$.

"Gallager's Lower Bound: $R_1(d)$ (as given by (2) with $N = 1$),
is a lower bound to $R_N(d)$, for all $N \ge 1$, to $R(d)$, and to
the average protocol information per message about message
arrival times between a source-receiver pair for Poisson message
arrivals of rate $\alpha$ and expected delay $d$. Furthermore, $R_1(d)$
is given by

$$R_1(d) = -\log_2\left(1 - e^{-\alpha d}\right) \ \text{bits/message.} \tag{4}$$

The probability measure that achieves $R_1(d)$ is defined
implicitly by

$$Y_1 = \max(X_1, d) + Z \tag{5}$$

where $Z$ is a non-negative random variable, independent of
$X_1$, with probability density $p_Z(z) = (\alpha+\rho)\exp(-(\alpha+\rho)z)$,
where $\rho$ is given by

$$\rho = \frac{\alpha e^{-\alpha d}}{1 - e^{-\alpha d}} \tag{6}$$"



Notice that Gallager's bound $R_1(d)$ on $R(d)$ is a
function of $\alpha d$. A little thought reveals that $R(d)$ itself
will be a function of $\alpha d$.
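
Gallager's bound is easy to evaluate. The sketch below computes $R_1(d) = -\log_2(1 - e^{-\alpha d})$ and checks the delay of the scheme $Y_1 = \max(X_1, d) + Z$. It assumes $\rho = \alpha e^{-\alpha d}/(1 - e^{-\alpha d})$, which is the value that makes the expected delay of that scheme come out to exactly $d$; the function names are illustrative.

```python
import math


def r1_bits(alpha, d):
    """Gallager's first-order lower bound, in bits per message."""
    return -math.log2(1.0 - math.exp(-alpha * d))


def rho(alpha, d):
    """Assumed value of rho: alpha * exp(-alpha d) / (1 - exp(-alpha d))."""
    q = math.exp(-alpha * d)
    return alpha * q / (1.0 - q)


def mean_delay(alpha, d):
    """Expected delay E[Y_1 - X_1] for Y_1 = max(X_1, d) + Z.

    E[max(X_1, d) - X_1] = d - (1 - exp(-alpha d)) / alpha for exponential X_1,
    and E[Z] = 1 / (alpha + rho) for the exponential Z.
    """
    hold = d - (1.0 - math.exp(-alpha * d)) / alpha
    extra = 1.0 / (alpha + rho(alpha, d))
    return hold + extra
```

The bound depends on $\alpha$ and $d$ only through the product $\alpha d$: at $\alpha d = \ln 2$ it equals exactly one bit per message, and for large $\alpha d$ it decays like $e^{-\alpha d}$.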

Gallager introduced two strategies for sending messages
which, when there are a large number of sources at each
node, seem to be close to optimal. Strategy 1, which is
better for small values of $\alpha d$, yields the following upper
bound for attainable protocol information per message:


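$$I^{S1}(d) \le \frac{H(P_{2\alpha d})}{2\alpha d} \ \text{bits/message,} \tag{7}$$

where

$$H(P_{2\alpha d}) = \sum_{n=0}^{\infty} \frac{e^{-2\alpha d}(2\alpha d)^n}{n!} \log_2 \frac{n!\, e^{2\alpha d}}{(2\alpha d)^n} \tag{8}$$
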

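Strategy 2, which is better for large $\alpha d$, yields an upper
bound $I^{S2}(d)$ given by (9) for $N \ge 2$, where $N$ is chosen by a
condition (10) on $\alpha d$.

[Equations (9) and (10) are illegible in the source. The surviving
fragments are a factor $2N(N+4/3)$ in (9) and a condition of the
form $\alpha d < \ldots$ in (10).]
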

$R_1(d)$, $I^{S1}(d)$, and $I^{S2}(d)$ are plotted in Figure 2.

The behavior of $I^{S2}(d)$, together with some other
information, led Gallager to conjecture that, for large $\alpha d$,
$R(d)$ goes to zero as $(\alpha d)^{-2}$ instead of as $e^{-\alpha d}$ like $R_1(d)$.
In the next section, we find an improved lower bound to $R(d)$
which does indeed approach zero as $(\alpha d)^{-2}$ for large $\alpha d$.
Chapter IV

The New Lower Bound

Our attack on the problem will be different from that
of (Gallager, 1976) in that we will transform the system
into a discrete-time system by sampling, use an information-
theoretic approach similar to that used by (Humblet, 1978)
on the fully discrete-time problem, and then return to the
original continuous problem by letting the sampling period
go to zero.

To turn the Poisson source into a discrete source,
we make use of the fact that, at any time $t$, the probability
of a message arrival in the next $\Delta$ seconds is independent
of what happened before $t$. If the Poisson source has
parameter $\alpha$ and if $\Delta$ is small, the probability of an arrival
in any time period of length $\Delta$ is $\alpha\Delta$ while the probability
of two or more arrivals in that period is negligible, so
the probability of no arrivals is $1 - \alpha\Delta$. Thus, if a Poisson
source is checked every $\Delta$ seconds (with $\Delta$ small), the
source can be viewed as a discrete-time source which at
each time instant emits a one (there is a message available)
with probability $\alpha\Delta$ and a zero (there is no message
available) with probability $1 - \alpha\Delta$. The source output is thus
a binary sequence $(x_1, x_2, \ldots)$ where $x_n = 1$ indicates a
message arrival in the time interval between $(n-1)\Delta$ and $n\Delta$.
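
The quality of this approximation can be checked directly from the Poisson distribution. The sketch below returns the exact probabilities of zero, one, and two or more arrivals in a slot of length $\Delta$; the function name is an illustrative assumption.

```python
import math


def slot_probabilities(alpha, delta):
    """Exact Poisson probabilities of 0, 1, and 2-or-more arrivals in a slot of length delta."""
    mean = alpha * delta
    p0 = math.exp(-mean)
    p1 = mean * math.exp(-mean)
    p2_plus = 1.0 - p0 - p1
    return p0, p1, p2_plus
```

For $\alpha = 1$ and $\Delta = 0.01$, the probability of exactly one arrival differs from $\alpha\Delta$ by about $10^{-4}$ and the probability of two or more arrivals is about $5 \times 10^{-5}$; both errors shrink like $\Delta^2$, so they vanish relative to $\alpha\Delta$ as the sampling period goes to zero.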

We drop hereafter the $\Delta$ from our notation. We further
introduce the notation $X_i^j = (x_i, x_{i+1}, \ldots, x_j)$.

The message arriving each time instant from the source
will be placed in a queue of unbounded length. At each
time instant, the processor serving the queue will decide
either to send a message from the queue (represented by $y_i = 1$)
or not to send a message (represented by $y_i = 0$).
When the decision is to send a message, the processor
transmits the first message in the queue.

The number of messages in the queue at the end of
time interval $i$, i.e., after a message has been placed in
the queue if $x_i = 1$ and after a message has been removed
from the queue and sent if $y_i = 1$, will be called the state
of the queue at time $i\Delta$ and denoted by $S_i$. We have the
following formula for updating the state:

$$S_{i-1} + x_i - y_i = S_i \tag{11}$$

We also have the constraint

$$S_i \ge 0 \tag{12}$$

which states the obvious fact that no message can be sent
before it has arrived from the source.
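
The state update and the non-negativity constraint can be sketched as follows. The function name `queue_states` and the use of an exception to signal a violated constraint are illustrative choices.

```python
def queue_states(x, y, s0=0):
    """Return [S_1, ..., S_N] from arrivals x and departures y using S_i = S_{i-1} + x_i - y_i.

    Raises ValueError if any state would be negative, i.e. if a message
    would be sent before it has arrived.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    states = []
    s = s0
    for i, (xi, yi) in enumerate(zip(x, y), start=1):
        s = s + xi - yi
        if s < 0:
            raise ValueError(f"constraint S_i >= 0 violated at i = {i}")
        states.append(s)
    return states
```

For arrivals `[1, 0, 1, 1, 0]` and departures `[0, 1, 0, 0, 1]` starting from an empty queue, the states are `[1, 0, 1, 2, 1]`.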

The discrete-time constraint that results from the
continuous-time average delay constraint is a constraint
upon the average length of time $W$ that each message can
spend in the queue. This constraint proves to be awkward
when used directly. Thus, we invoke Little's formula
(Kleinrock, 1975) to transform it into a constraint on the
average number of ones allowed in the queue. Little's
formula asserts that (under very weak conditions which are
satisfied in our system)

$$L = \lambda W$$

where $L$ is the average number of messages in the queue,
$\lambda$ is the arrival rate of the messages and $W$ is the average
length of time each message spends in the queue. Relating
the quantities to our problem we have:

$\lambda$ = number of messages arriving per time interval = $\alpha\Delta$
$W$ = average number of time intervals each message
waits = $d/\Delta$

Then $L$ = average number of messages in the queue = $\lambda W = \alpha\Delta \cdot \frac{d}{\Delta} = \alpha d$.
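
Little's formula can be observed in a small discrete-time simulation. The sketch below assumes a hypothetical sending policy (send the first message in the queue every `period` slots), Bernoulli arrivals with probability $\alpha\Delta$ per slot, and a run that continues until the queue has emptied; none of these choices come from the text.

```python
import random
from collections import deque


def simulate_queue(alpha_delta, period, n_slots, seed=0):
    """Simulate the queue; return (states, waits) with waits measured in slots."""
    rng = random.Random(seed)
    queue = deque()  # arrival slot of each waiting message, first in first out
    states = []      # S_i, the queue length at the end of slot i
    waits = []       # number of slots each message spent in the queue
    i = 0
    while i < n_slots or queue:
        i += 1
        # Arrivals stop after n_slots; the loop then only drains the queue.
        if i <= n_slots and rng.random() < alpha_delta:
            queue.append(i)
        # Hypothetical policy: send the head-of-line message every `period` slots.
        if queue and i % period == 0:
            waits.append(i - queue.popleft())
        states.append(len(queue))
    return states, waits


def little_quantities(states, waits):
    """Return (L, lam, W): mean queue length, arrivals per slot, mean wait in slots."""
    total_slots = len(states)
    mean_queue = sum(states) / total_slots
    arrival_rate = len(waits) / total_slots
    mean_wait = sum(waits) / len(waits)
    return mean_queue, arrival_rate, mean_wait
```

Because every message that waits $w$ slots is counted in exactly $w$ of the end-of-slot states, the run satisfies $L = \lambda W$ up to rounding once the queue has drained, whatever the sending policy.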


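Thus, the discrete-time problem becomes:

$$\text{minimize} \quad \lim_{N \to \infty} \frac{1}{N} I(X_1^N; Y_1^N \mid S_0) \tag{13}$$

where

$$\Pr(X_1^N = k_1^N) = \prod_{i=1}^{N} \Pr(x_i = k_i) \tag{14}$$

and where $\Pr(x_i = 0) = 1 - \alpha\Delta$ and $\Pr(x_i = 1) = \alpha\Delta$, subject
to the constraints

$$S_i \ge 0, \quad \text{for all } i \tag{12}$$

$$\lim_{N \to \infty} \frac{1}{N} \sum_{j=1}^{N} E[S_j - \alpha d] \le 0 \tag{15}$$

and

[constraint (16) is illegible in the source]  (16)

for each $i$, $n$, and $r$, $0 < r \le n$.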

Equation (16) is a causality-type constraint which
states implicitly that the decisions made by the processor
to determine the $y_i$ within a block of length $n$ may not
depend on $x_j$ for any $j$ which is larger than the largest $i$
in the block. This allows the processor to "look ahead"
but only up to $n$ time intervals before deciding what the
first $y_i$ in the block should be. This constraint is weaker
than a full causality constraint would be and thus using it
will provide us with a lower bound to the actual rate-
distortion problem as it should be posed.

Before beginning our derivation of the lower bound we
introduce one more item of notation. The Hamming weight
function of a binary vector is the number of ones in the
vector, and we write $W(X_i^j)$ for the Hamming weight of $X_i^j$.

The expression (15) implies that, for every $n$, there is an $r$,
$0 < r \le n$, such that

$$\lim_{m \to \infty} \frac{1}{m} \sum_{i=0}^{m-1} E[S_{in+r-1} - \alpha d] \le 0 \tag{17}$$

for, if the left side of (17) were greater than zero for all
$r$, $0 < r \le n$, then the left side of (15) would also be greater
than zero. We can now establish

Theorem 1: If $S_0$ is a non-negative, integer-valued random
variable with $E(S_0) < \infty$ and if $\{x_i, y_i\}$, $i = 1, 2, \ldots$, is
a random sequence (possibly dependent on $S_0$), satisfying
(12), (15) and (16), then

$$\liminf_{N \to \infty} \frac{1}{N} I(X_1^N; Y_1^N \mid S_0) \ \ge \inf_{\substack{\text{feasible distributions} \\ \text{on } (X, Y, S)}} \frac{1}{n} I(X; Y \mid S) \tag{18}$$

where $X$ and $Y$ are binary random vectors of length $n$, $S$ is
a non-negative random variable, and where a feasible
distribution is defined as some joint distribution

$$\Pr(X = (j_1, j_2, \ldots, j_n),\ Y = (k_1, k_2, \ldots, k_n),\ S = i)$$

satisfying

i. $\Pr(X = k) = \prod_{i=1}^{n} \Pr(x_i = k_i)$, where
$\Pr(x_i = 0) = 1 - \alpha\Delta$ and $\Pr(x_i = 1) = \alpha\Delta$;

ii. $E(S) \le \alpha d$;

and iii. $E[W(X) - W(Y) + S] \le \alpha d$.

Proof : The  proof  of  the  theorem  is  accomplished  by  a chain 
of  inequalities  which  we  will  first  present  together  emd 
then  explain. one  at  a time. 

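$$
\begin{aligned}
\liminf_{N\to\infty}\frac{1}{N} I\big(X_1^N; Y_1^N \mid S_0\big)
&\ge \liminf_{m\to\infty}\frac{1}{nm}\sum_{i=0}^{m-1} I\big(X_{in+r}^{in+r+n-1}; Y_{in+r}^{in+r+n-1} \mid S_0, X_1^{in+r-1}, Y_1^{in+r-1}\big) \\
&= \liminf_{m\to\infty}\frac{1}{nm}\sum_{i=0}^{m-1}\Big[H\big(X_{in+r}^{in+r+n-1} \mid S_0, X_1^{in+r-1}, Y_1^{in+r-1}\big) - H\big(X_{in+r}^{in+r+n-1} \mid Y_{in+r}^{in+r+n-1}, S_0, X_1^{in+r-1}, Y_1^{in+r-1}\big)\Big] \\
&\ge \liminf_{m\to\infty}\frac{1}{nm}\sum_{i=0}^{m-1}\Big[H\big(X_{in+r}^{in+r+n-1} \mid S_{in+r-1}\big) - H\big(X_{in+r}^{in+r+n-1} \mid Y_{in+r}^{in+r+n-1}, S_{in+r-1}\big)\Big] \\
&= \liminf_{m\to\infty}\frac{1}{nm}\sum_{i=0}^{m-1} I\big(X_{in+r}^{in+r+n-1}; Y_{in+r}^{in+r+n-1} \mid S_{in+r-1}\big) \\
&\ge \frac{1}{n} I(X; Y \mid S) \\
&\ge \inf_{\text{feasible distributions on } (X, Y, S)} \frac{1}{n} I(X; Y \mid S)
\end{aligned}
$$

The first inequality is true for all n, $n \ge 1$, and all r, $0 < r \le n$. The X, Y and S of the last two lines will be defined in the sequel.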
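Inequality 1

$$\liminf_{N\to\infty}\frac{1}{N} I\big(X_1^N; Y_1^N \mid S_0\big) \ge \liminf_{m\to\infty}\frac{1}{nm}\sum_{i=0}^{m-1} I\big(X_{in+r}^{in+r+n-1}; Y_{in+r}^{in+r+n-1} \mid S_0, X_1^{in+r-1}, Y_1^{in+r-1}\big) \qquad (19)$$

Let $X_1^k$, $X_{k+1}^N$ partition $X_1^N$ in any way and $Y_1^k$, $Y_{k+1}^N$ partition $Y_1^N$ in the same way. Then

$$
\begin{aligned}
I\big(X_1^N; Y_1^N \mid S_0\big) &= H\big(Y_1^k, Y_{k+1}^N \mid S_0\big) - H\big(Y_1^k, Y_{k+1}^N \mid X_1^N, S_0\big) \\
&= H\big(Y_{k+1}^N \mid Y_1^k, S_0\big) + H\big(Y_1^k \mid S_0\big) - H\big(Y_{k+1}^N \mid Y_1^k, X_1^N, S_0\big) - H\big(Y_1^k \mid X_1^N, S_0\big) \\
&\ge H\big(Y_1^k \mid S_0\big) - H\big(Y_1^k \mid X_1^k, S_0\big) + H\big(Y_{k+1}^N \mid X_1^k, Y_1^k, S_0\big) - H\big(Y_{k+1}^N \mid Y_1^k, X_1^N, S_0\big) \\
&= I\big(X_1^k; Y_1^k \mid S_0\big) + I\big(X_{k+1}^N; Y_{k+1}^N \mid X_1^k, Y_1^k, S_0\big)
\end{aligned}
$$

As k is arbitrary, we can apply this argument repeatedly to get

$$\frac{1}{N} I\big(X_1^N; Y_1^N \mid S_0\big) \ge \frac{1}{r + mn + (N - mn - r)}\Big[I\big(X_1^{r-1}; Y_1^{r-1} \mid S_0\big) + \sum_{i=0}^{m-1} I\big(X_{in+r}^{in+r+n-1}; Y_{in+r}^{in+r+n-1} \mid S_0, X_1^{in+r-1}, Y_1^{in+r-1}\big) + I\big(X_{mn+r}^{N}; Y_{mn+r}^{N} \mid S_0, X_1^{mn+r-1}, Y_1^{mn+r-1}\big)\Big]$$

where m is the greatest integer equal to or less than $(N - r)/n$.

Taking the limits of this expression as $N \to \infty$ and $m \to \infty$, and noting that the first and last terms of the right side of it are finite and that $(N - nm - r) < n$, yields (19).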
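The splitting step used for Inequality 1 can be checked numerically on small alphabets. The sketch below is only an illustration under stated assumptions: it takes N = 2 and k = 1, treats $S_0$ as a constant, and draws arbitrary joint distributions on four binary variables. The function names are hypothetical and not part of the thesis.

```python
import itertools
import math
import random


def entropy(pmf):
    """Shannon entropy in nats of a dict mapping outcomes to probabilities."""
    return -sum(p * math.log(p) for p in pmf.values() if p > 0)


def marginal(joint, keep):
    """Marginalise a joint pmf over tuples onto the coordinate indices in `keep`."""
    out = {}
    for outcome, p in joint.items():
        key = tuple(outcome[i] for i in keep)
        out[key] = out.get(key, 0.0) + p
    return out


def cond_mutual_info(joint, a, b, c=()):
    """I(A; B | C) = H(A,C) + H(B,C) - H(A,B,C) - H(C), in nats."""
    def h(idx):
        return entropy(marginal(joint, idx)) if idx else 0.0
    return h(a + c) + h(b + c) - h(a + b + c) - h(c)


def random_joint(num_vars, rng):
    """Random strictly positive joint pmf over `num_vars` binary variables."""
    outcomes = itertools.product((0, 1), repeat=num_vars)
    weights = {o: rng.random() + 1e-3 for o in outcomes}
    total = sum(weights.values())
    return {o: w / total for o, w in weights.items()}


def splitting_gap(joint):
    """I(X1 X2; Y1 Y2) - [I(X1; Y1) + I(X2; Y2 | X1, Y1)].

    Coordinates of each outcome are ordered (x1, x2, y1, y2).
    """
    x1, x2, y1, y2 = (0,), (1,), (2,), (3,)
    lhs = cond_mutual_info(joint, x1 + x2, y1 + y2)
    rhs = cond_mutual_info(joint, x1, y1) + cond_mutual_info(joint, x2, y2, x1 + y1)
    return lhs - rhs
```

For every distribution drawn this way the gap should be non-negative up to rounding, which is what the chain of entropy inequalities above asserts for a single split.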

Inequality 2

$$
\begin{aligned}
&\liminf_{m\to\infty}\frac{1}{nm}\sum_{i=0}^{m-1}\Big[H\big(X_{in+r}^{in+r+n-1} \mid S_0, X_1^{in+r-1}, Y_1^{in+r-1}\big) - H\big(X_{in+r}^{in+r+n-1} \mid Y_{in+r}^{in+r+n-1}, S_0, X_1^{in+r-1}, Y_1^{in+r-1}\big)\Big] \\
&\quad\ge \liminf_{m\to\infty}\frac{1}{nm}\sum_{i=0}^{m-1}\Big[H\big(X_{in+r}^{in+r+n-1} \mid S_{in+r-1}\big) - H\big(X_{in+r}^{in+r+n-1} \mid Y_{in+r}^{in+r+n-1}, S_{in+r-1}\big)\Big] \qquad (20)
\end{aligned}
$$

$S_{in+r-1}$ is determined from $S_0$, $X_1^{in+r-1}$, and $Y_1^{in+r-1}$, so the inequality on the second term on each side of (20) follows from the fact that removing conditioning cannot decrease entropy. The inequality on the first term uses the partial causality constraint (16), i.e., the fact that $X_{in+r}^{in+r+n-1}$ is independent of $S_0$, $X_1^{in+r-1}$ and $Y_1^{in+r-1}$.
Inequality 3

$$\liminf_{m\to\infty}\frac{1}{n}\sum_{i=0}^{m-1}\frac{1}{m} I\big(X_{in+r}^{in+r+n-1}; Y_{in+r}^{in+r+n-1} \mid S_{in+r-1}\big) \ge \frac{1}{n} I(X; Y \mid S) \qquad (21)$$

with X, Y and S still to be defined. We know that the lim inf on the left side of (21) exists because the series for each m is bounded by $H(x_i)$ above and by zero below. This implies that there is a subsequence which converges to the lim inf of the original sequence. Let $m_v$, $v = 1, 2, \ldots$, be that convergent subsequence; i.e.,

$$\lim_{v\to\infty}\frac{1}{n}\sum_{i=0}^{m_v-1}\frac{1}{m_v} I\big(X_{in+r}^{in+r+n-1}; Y_{in+r}^{in+r+n-1} \mid S_{in+r-1}\big) = \text{left side of (21)}.$$

For each j, let


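$\{\Pr(X_{m_v} = j)\}$, $v = 1, 2, \ldots$, be a sequence such that

$$\Pr(X_{m_v} = j) = \sum_{i=0}^{m_v-1}\frac{1}{m_v}\Pr\big(X_{in+r}^{in+r+n-1} = j\big) = \Pr\big(X_1^{n} = j\big)$$

because the $x_i$ are independent and identically distributed.

Let $\{\Pr(Y_{m_v} = k \mid X_{m_v} = j)\}$, $v = 1, 2, \ldots$, be a sequence for each k and j such that

$$\Pr(Y_{m_v} = k \mid X_{m_v} = j) = \sum_{i=0}^{m_v-1}\frac{1}{m_v}\Pr\big(Y_{in+r}^{in+r+n-1} = k \mid X_{in+r}^{in+r+n-1} = j\big).$$

Finally, let $\{\Pr(S_{m_v} = \ell \mid X_{m_v} = j, Y_{m_v} = k)\}$, $v = 1, 2, \ldots$, be a sequence for each j, k and $\ell$ combination such that

$$\Pr(S_{m_v} = \ell \mid X_{m_v} = j, Y_{m_v} = k) = \sum_{i=0}^{m_v-1}\frac{1}{m_v}\Pr\big(S_{in+r-1} = \ell\big)$$

where the dependence on $X_{m_v}$ and $Y_{m_v}$ is implicit through the state update equation (11).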

We now consider the sequence of probabilities

$$\Pr(S_{m_v} = \ell \mid X_{m_v} = j, Y_{m_v} = k)\,\Pr(Y_{m_v} = k \mid X_{m_v} = j)\,\Pr(X_{m_v} = j)$$

for $v = 1, 2, \ldots$. This sequence is bounded between zero and one, so there exists at least one subsequence which converges. Let

$$\Pr(S_{m_{v_w}} = \ell \mid X_{m_{v_w}} = j, Y_{m_{v_w}} = k)\,\Pr(Y_{m_{v_w}} = k \mid X_{m_{v_w}} = j)\,\Pr(X_{m_{v_w}} = j), \quad w = 1, 2, \ldots,$$

be such a subsequence. Note now that not only does this subsequence converge, but it must also converge to a probability distribution. Let X and Y be random vectors of length n, and S a random variable, such that, for each $(j, k, \ell)$ combination,

$$\Pr(X = j, Y = k, S = \ell) = \lim_{w\to\infty}\Pr\big(X_{m_{v_w}} = j, Y_{m_{v_w}} = k, S_{m_{v_w}} = \ell\big).$$

Since the averaged mutual information on the left side of (21), evaluated along $m_v$, is a convergent sequence in v, every subsequence, in particular the one along $m_{v_w}$, converges to the same limit. Its limit as $w \to \infty$ is therefore also equal to the left side of (21).

We may now use the fact that the mutual information is convex with respect to the transition probabilities (Gallager, 1968) to get (21) with X, Y and S as defined above.

Inequality 4

$$\frac{1}{n} I(X; Y \mid S) \ge \inf_{\text{feasible distributions on } (X, Y, S)} \frac{1}{n} I(X; Y \mid S) \qquad (22)$$

To show that (22) is true, we need only show that the distribution Pr(X, Y, S) as given in the section on Inequality 3 satisfies the three conditions for a feasible distribution given in the theorem statement.


Condition i: $\Pr(X = k) = \prod_{i=1}^{n}\Pr(x_i = k_i)$ where $\Pr(x_i = 0) = 1 - \alpha\Delta$ and $\Pr(x_i = 1) = \alpha\Delta$. This is satisfied trivially by the definition of X.

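Condition ii: $E(S) \le \alpha d$:

We first choose p = r + 1 where r, $0 < r \le n$, is such that (17) is satisfied, i.e., such that

$$\limsup_{m\to\infty}\frac{1}{m}\sum_{i=0}^{m-1}\big(E[S_{in+p-1}] - \alpha d\big) \le 0.$$

But, since the lim sup of a subsequence is at most equal to the lim sup of the original sequence,

$$\limsup_{w\to\infty}\Big\{\sum_{i=0}^{m_{v_w}-1}\frac{1}{m_{v_w}} E\big(S_{in+p-1}\big) - \alpha d\Big\} \le 0 \qquad (23)$$

$$\Rightarrow\ E[S - \alpha d] \le 0$$

$$\Rightarrow\ E[S] \le \alpha d$$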

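Condition iii: $E[W(X) - W(Y) + S] \le \alpha d$.

We start with (23), change the summation index so that the state appears as $S_{jn+p+n-1}$, and separate the boundary term from the sum. Since, in the limit, the separated term (which is divided by $m_{v_w}$) goes to zero, we have

$$\limsup_{w\to\infty}\Big\{\sum_{j=0}^{m_{v_w}-1}\frac{1}{m_{v_w}} E\big(S_{jn+p+n-1}\big) - \alpha d\Big\} \le 0 \qquad (24)$$

By using the state update equation (11) repeatedly, we also have

$$S_{jn+p-1} + W\big(X_{jn+p}^{jn+p+n-1}\big) - W\big(Y_{jn+p}^{jn+p+n-1}\big) = S_{jn+p+n-1}.$$

Substituting this into (24) gives

$$\limsup_{w\to\infty}\Big\{\sum_{j=0}^{m_{v_w}-1}\frac{1}{m_{v_w}} E\Big(S_{jn+p-1} + W\big(X_{jn+p}^{jn+p+n-1}\big) - W\big(Y_{jn+p}^{jn+p+n-1}\big)\Big) - \alpha d\Big\} \le 0$$

$$\limsup_{w\to\infty}\Big\{\sum_{j=0}^{m_{v_w}-1}\frac{1}{m_{v_w}} E\big(S_{jn+p-1}\big) + \sum_{j=0}^{m_{v_w}-1}\frac{1}{m_{v_w}} E\,W\big(X_{jn+p}^{jn+p+n-1}\big) - \sum_{j=0}^{m_{v_w}-1}\frac{1}{m_{v_w}} E\,W\big(Y_{jn+p}^{jn+p+n-1}\big) - \alpha d\Big\} \le 0$$

So $E[S + W(X) - W(Y) - \alpha d] \le 0$.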

At this point we have shown

$$\liminf_{N\to\infty}\frac{1}{N} I\big(X_1^N; Y_1^N \mid S_0\big) \ge \inf_{\text{feasible distributions on } (X, Y, S)} \frac{1}{n} I(X; Y \mid S) \qquad (25)$$

Since (25) is true for all n, we can take the maximum of the right side over n and obtain

$$\liminf_{N\to\infty}\frac{1}{N} I\big(X_1^N; Y_1^N \mid S_0\big) \ge \max_n \inf_{\text{feasible distributions on } (X, Y, S)} \frac{1}{n} I(X; Y \mid S) \qquad (18)$$

which proves the theorem.

Hereafter  we  will  abbreviate  "feasible  distributions 
on  (X,  Y,  S)"  to  "feas.  dist." 

We now present a lemma due to Humblet (1978), which will enable us to underbound the right side of (18).

Lemma 1: If X and Y are random vectors with some joint probability measure and thus some mutual information I(X; Y), and if $f(\cdot)$ and $g(\cdot)$ are deterministic functions, then

$$I(X; Y) \ge I(f(X); g(Y)) \qquad (26)$$

Proof:

$$
\begin{aligned}
I(X; Y) &= H(X) - H(X \mid Y) \ge H(X) - H(X \mid g(Y)) \\
&= I(X; g(Y)) = H(g(Y)) - H(g(Y) \mid X) \ge H(g(Y)) - H(g(Y) \mid f(X)) \\
&= I(f(X); g(Y)).
\end{aligned}
$$
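Lemma 1 can be illustrated numerically with f and g both taken to be the Hamming weight, which is how the lemma is applied later. The following is a minimal sketch, assuming small binary vectors and an arbitrary joint distribution; all function names are hypothetical.

```python
import math
import random
from collections import defaultdict


def mutual_information(joint):
    """I(A; B) in nats for a joint pmf given as {(a, b): probability}."""
    pa, pb = defaultdict(float), defaultdict(float)
    for (a, b), p in joint.items():
        pa[a] += p
        pb[b] += p
    return sum(p * math.log(p / (pa[a] * pb[b]))
               for (a, b), p in joint.items() if p > 0)


def push_forward(joint, f, g):
    """Joint pmf of (f(A), g(B)) induced by deterministic maps f and g."""
    out = defaultdict(float)
    for (a, b), p in joint.items():
        out[(f(a), g(b))] += p
    return dict(out)


def hamming_weight(bits):
    """Number of ones in a binary tuple."""
    return sum(bits)


def random_joint_on_pairs(n, rng):
    """Random joint pmf on pairs of binary n-tuples (illustrative input only)."""
    vectors = [tuple((v >> i) & 1 for i in range(n)) for v in range(2 ** n)]
    weights = {(x, y): rng.random() for x in vectors for y in vectors}
    total = sum(weights.values())
    return {pair: w / total for pair, w in weights.items()}
```

Comparing `mutual_information(joint)` with `mutual_information(push_forward(joint, hamming_weight, hamming_weight))` shows how much information is discarded when only the weights of the two vectors are kept.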

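Applying (26) to (18),

$$
\begin{aligned}
\liminf_{N\to\infty}\frac{1}{N} I\big(X_1^N; Y_1^N \mid S_0\big) &\ge \max_n \inf_{\text{feas. dist.}} \frac{1}{n} I(X; Y \mid S) \\
&\ge \max_n \inf_{\text{feas. dist.}} \frac{1}{n} I(W(X); W(Y) \mid S) \qquad (27) \\
&= \max_n \inf_{\text{feas. dist.}} \frac{1}{n}\big[H(W(X) \mid S) - H(W(X) \mid W(Y), S)\big] \\
&= \max_n \inf_{\text{feas. dist.}} \frac{1}{n}\big[H(W(X) \mid S) - H(S' \mid W(Y), S)\big] \qquad (28)
\end{aligned}
$$

where $S' = W(X) - W(Y) + S$.

Equation (28) is true because, given W(Y) and S, W(X) and S' differ only by a known constant and so they have the same conditional entropy. In addition,

$$\max_n \inf_{\text{feas. dist.}} \frac{1}{n}\big[H(W(X)) - H(S')\big] \ge \max_n \frac{1}{n}\Big[\inf_{\text{feas. dist.}} H(W(X)) - \sup_{\text{feas. dist.}} H(S')\Big] \qquad (29)$$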

To overbound the last term of the right side of (29) and thus underbound the whole expression, we use the fact that S' is constrained by $E(S') \le \alpha d$ and that S' is non-negative. We then maximize the entropy of S' given these constraints. The result must be at least equal to the sup over feasible distributions.

We now make use of a well-known result (Gallager, 1968), but give the proof as it is not explicitly available in the literature.

Lemma 2: If S' is a non-negative random variable with $\Pr(S' = k) = p_k$ and if $E(S') \le \alpha d$, then

$$H(S') \le (1 + \alpha d)\,\mathcal{H}\Big(\frac{1}{1 + \alpha d}\Big) \qquad (30)$$

where $\mathcal{H}(\cdot)$ is the binary entropy function.

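Proof: We note that the maximum will occur when $E(S') = \alpha d$, so we wish to find

$$\max \sum_{k=0}^{\infty} -p_k \ln p_k = -\min \sum_{k=0}^{\infty} p_k \ln p_k$$

subject to $\sum_{k=0}^{\infty} p_k = 1$, $\sum_{k=0}^{\infty} k\,p_k = \alpha d$, and $p_k \ge 0$ for all k. We temporarily ignore the inequality constraint and form the Lagrangian:

$$L(p, \lambda, \mu) = \sum_{k=0}^{\infty} p_k \ln p_k + \lambda\Big(\sum_{k=0}^{\infty} p_k - 1\Big) + \mu\Big(\sum_{k=0}^{\infty} k\,p_k - \alpha d\Big)$$

$$0 = \frac{\partial L}{\partial p_i} = 1 + \ln p_i + \lambda + \mu i$$

$$p_i = e^{-(1 + \lambda + \mu i)} \qquad (31)$$

so all $p_i > 0$ and our inequality constraint will be satisfied.

$$0 = \frac{\partial L}{\partial \lambda} = \sum_{k=0}^{\infty} p_k - 1 = \sum_{k=0}^{\infty} e^{-(1 + \lambda + k\mu)} - 1$$

$$\sum_{k=0}^{\infty} e^{-k\mu} = e^{1+\lambda}$$

$$\frac{1}{1 - e^{-\mu}} = e^{1+\lambda} \qquad (32)$$

$$0 = \frac{\partial L}{\partial \mu} = \sum_{k=0}^{\infty} k\,p_k - \alpha d = \sum_{k=0}^{\infty} k\,e^{-(1 + \lambda + k\mu)} - \alpha d$$

$$e^{-(1+\lambda)}\sum_{k=0}^{\infty} k\,(e^{-\mu})^k = \alpha d$$

$$e^{-(1+\lambda)}\,\frac{e^{-\mu}}{(1 - e^{-\mu})^2} = \alpha d \qquad (33)$$

Substituting (32) into (33) yields

$$(1 - e^{-\mu})\,\frac{e^{-\mu}}{(1 - e^{-\mu})^2} = \frac{e^{-\mu}}{1 - e^{-\mu}} = \alpha d$$

$$e^{-\mu} = \frac{\alpha d}{1 + \alpha d}, \qquad \mu = \ln\frac{1 + \alpha d}{\alpha d}$$

$$e^{1+\lambda} = \frac{1}{1 - \dfrac{\alpha d}{1 + \alpha d}} = 1 + \alpha d$$

Thus,

$$p_k = \frac{1}{1 + \alpha d}\Big(\frac{\alpha d}{1 + \alpha d}\Big)^k$$

and

$$
\begin{aligned}
H(S') &= \sum_{k=0}^{\infty}\frac{1}{1 + \alpha d}\Big(\frac{\alpha d}{1 + \alpha d}\Big)^k\Big[\ln(1 + \alpha d) + k\ln\frac{1 + \alpha d}{\alpha d}\Big] \\
&= \ln(1 + \alpha d) + \alpha d\,\ln\Big(\frac{1 + \alpha d}{\alpha d}\Big)\ \text{nats} \\
&= (1 + \alpha d)\,\mathcal{H}\Big(\frac{1}{1 + \alpha d}\Big)\ \text{bits.} \qquad (34)
\end{aligned}
$$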
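As a numerical sanity check on Lemma 2, the sketch below evaluates the entropy of the geometric distribution found in the proof and compares it with the closed form of (34), and also with a Poisson distribution of the same mean as one example of a competing distribution. The truncation point `kmax` and the function names are assumptions made for the illustration.

```python
import math


def geometric_pmf(mean, kmax):
    """p_k = (1/(1+mean)) * (mean/(1+mean))**k for k = 0..kmax (truncated)."""
    q = mean / (1.0 + mean)
    return [(1.0 - q) * q ** k for k in range(kmax + 1)]


def poisson_pmf(mean, kmax):
    """Poisson pmf built with the recursion p_k = p_{k-1} * mean / k."""
    pmf = [math.exp(-mean)]
    for k in range(1, kmax + 1):
        pmf.append(pmf[-1] * mean / k)
    return pmf


def entropy_nats(pmf):
    """Entropy in nats of a list of probabilities, skipping zero entries."""
    return -sum(p * math.log(p) for p in pmf if p > 0)


def lemma2_bound_nats(mean):
    """Right side of (34) in nats, with mean playing the role of alpha*d."""
    return math.log(1.0 + mean) + mean * math.log((1.0 + mean) / mean)
```

With `mean` set to a value of $\alpha d$, `entropy_nats(geometric_pmf(mean, 2000))` should agree with `lemma2_bound_nats(mean)` to within truncation and rounding error, while the Poisson entropy stays strictly below it.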


We are now left with the problem of underbounding

$$\inf_{\text{feas. dist.}} H(W(X)).$$

Note that the infimum is meaningless since the feasibility constraints determine the distribution on X completely and therefore they also determine the distribution on W(X). But, since nothing is gained by trying to find H(W(X)) in the discrete-time setting rather than the continuous-time setting, we will apply our limiting operation on the sampling time interval and return to the continuous-time problem.

To this point, we have shown that, for any $\Delta$,

$$\liminf_{N\to\infty}\frac{1}{N} I\big(X_1^{N\Delta}; Y_1^{N\Delta} \mid S_0\big) \ge \max_n \frac{1}{n}\Big[H(W(X)) - \ln(1 + \alpha d) - \alpha d\,\ln\Big(\frac{1 + \alpha d}{\alpha d}\Big)\Big]\ \text{nats per time interval} \qquad (35)$$

where we have returned $\Delta$ to our notation on the left side of (35) in anticipation of returning to the continuous-time problem, and where W(X) is binomially distributed with parameters $p = \alpha\Delta$ and N = n, i.e.

$$\Pr(W(X) = k) = \binom{n}{k}(\alpha\Delta)^k (1 - \alpha\Delta)^{n-k}.$$

Now in order to recover our original Poisson source, we must let $\Delta \to 0$, $n \to \infty$ and $N \to \infty$ so that $n\Delta \to t$ and $N\Delta \to T$.

We first divide in (35) by $\alpha\Delta$ to get

$$\liminf_{N\Delta\to\infty}\frac{1}{N\Delta\alpha} I\big(X_1^{N\Delta}; Y_1^{N\Delta} \mid S_0\big) \ge \max_n \frac{1}{n\Delta\alpha}\Big[H(W(X)) - \ln(1 + \alpha d) - \alpha d\,\ln\Big(\frac{1 + \alpha d}{\alpha d}\Big)\Big]\ \text{nats per message}$$

We now let $\Delta \to 0$, $n \to \infty$, $n\Delta \to t$, $N\Delta \to T$ as described above.

$$\liminf_{T\to\infty}\frac{1}{\alpha T} I\big(X^T; Y^T\big) \ge \max_t \frac{1}{\alpha t}\Big[H(W(X^t)) - \ln(1 + \alpha d) - \alpha d\,\ln\Big(\frac{1 + \alpha d}{\alpha d}\Big)\Big]\ \text{nats per message} \qquad (36)$$

where $X^T$ represents a Poisson process with parameter $\alpha$ from time 0 to T, $Y^T$ is some other stochastic process from time 0 to T and $W(X^t)$ is a Poisson random variable associated with the Poisson process at time t. $W(X^t)$ is distributed as follows:

$$\Pr(W(X^t) = k) = \frac{(\alpha t)^k e^{-\alpha t}}{k!}$$
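The limiting operation can be illustrated by comparing the entropy of the binomial weight, with n slots and per-slot probability $\alpha t / n$, against the entropy of the Poisson variable with mean $\alpha t$. This is a sketch only: the log-domain evaluation through `math.lgamma` and the Poisson truncation point are implementation choices, not part of the derivation.

```python
import math


def binomial_pmf(n, p):
    """Binomial(n, p) pmf for 0 < p < 1, evaluated in the log domain."""
    log_p, log_q = math.log(p), math.log1p(-p)
    return [
        math.exp(math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                 + k * log_p + (n - k) * log_q)
        for k in range(n + 1)
    ]


def poisson_pmf(mean, kmax=200):
    """Poisson(mean) pmf truncated at kmax (adequate for small means)."""
    return [math.exp(k * math.log(mean) - mean - math.lgamma(k + 1))
            for k in range(kmax + 1)]


def entropy_nats(pmf):
    """Entropy in nats of a list of probabilities, skipping zero entries."""
    return -sum(p * math.log(p) for p in pmf if p > 0)


def entropy_gap(n, alpha_t):
    """|H(Binomial(n, alpha_t / n)) - H(Poisson(alpha_t))| in nats."""
    return abs(entropy_nats(binomial_pmf(n, alpha_t / n))
               - entropy_nats(poisson_pmf(alpha_t)))
```

Evaluating `entropy_gap(n, 2.0)` for increasing n shows the gap shrinking as the sampling interval $\Delta = t/n$ goes to zero.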


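It is the entropy of this random variable that we must lower bound, and we use the properties of the gamma function to assist us.

Lemma 3: The gamma function

$$\Gamma(x) = \int_0^{\infty} t^{x-1} e^{-t}\,dt, \quad x \ge 1 \qquad (37)$$

has the properties that $\Gamma(1 + x) = x!$ when x is a non-negative integer and, for $x \ge 1$, $\ln[\Gamma(x)]$ is convex $\cup$ in x.

Proof: The first property is well-known (see Feller, 1968). To prove convexity, we show that $\frac{d^2 \ln\Gamma(x)}{dx^2} \ge 0$ as follows:

$$\frac{d \ln\Gamma(x)}{dx} = \frac{\int_0^{\infty} (\ln t)\,t^{x-1} e^{-t}\,dt}{\int_0^{\infty} t^{x-1} e^{-t}\,dt}$$

$$\frac{d^2 \ln\Gamma(x)}{dx^2} = \frac{\Big[\int_0^{\infty} t^{x-1} e^{-t}\,dt\Big]\Big[\int_0^{\infty} (\ln t)^2\,t^{x-1} e^{-t}\,dt\Big] - \Big[\int_0^{\infty} (\ln t)\,t^{x-1} e^{-t}\,dt\Big]^2}{\Big(\int_0^{\infty} t^{x-1} e^{-t}\,dt\Big)^2}$$

Defining $f(t) = (t^{x-1} e^{-t})^{1/2}$ and $g(t) = \ln t\,(t^{x-1} e^{-t})^{1/2}$ and using the Schwarz inequality

$$\Big(\int_0^{\infty} f(t)\,g(t)\,dt\Big)^2 \le \Big[\int_0^{\infty} f^2(t)\,dt\Big]\Big[\int_0^{\infty} g^2(t)\,dt\Big],$$

we have that $\frac{d^2 \ln\Gamma(x)}{dx^2} \ge 0$, so the logarithm of the gamma function is convex $\cup$.

We now return to our entropy underbounding.

$$H(W(X^t)) = \sum_{n=0}^{\infty}\frac{(\alpha t)^n e^{-\alpha t}}{n!}\big[-n\log\alpha t + \alpha t + \log n!\big] = -\alpha t\log\alpha t + \alpha t + \overline{\log\Gamma(n + 1)}$$

where the overbar denotes the expectation over the Poisson variable n. Because $\ln\Gamma$ is convex $\cup$ (Lemma 3), Jensen's inequality gives $\overline{\log\Gamma(n + 1)} \ge \log\Gamma(\alpha t + 1)$. We now use the Stirling approximation (usually used for n! but also true for non-integer values of the gamma function) (Feller, 1968):

$$\Gamma(1 + x) \ge (2\pi)^{1/2}\,x^{x + 1/2}\,e^{-x}$$

So

$$
\begin{aligned}
H(W(X^t)) &\ge -\alpha t\log\alpha t + \alpha t + \tfrac{1}{2}\log(2\pi) + \alpha t\log\alpha t - \alpha t + \tfrac{1}{2}\log\alpha t \qquad (38) \\
&= \tfrac{1}{2}\ln 2\pi + \tfrac{1}{2}\ln\alpha t\ \text{nats.}
\end{aligned}
$$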
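The three facts about the gamma function that feed (38) can each be spot-checked with the standard library. The sketch below is an illustration under assumptions: it tests convexity of $\ln\Gamma$ by second differences on a grid, the Jensen step for a Poisson variable, and the Stirling-type lower bound at a few points. It is not a proof, and the helper names are hypothetical.

```python
import math


def second_difference(func, x, h):
    """Central second difference func(x+h) - 2*func(x) + func(x-h)."""
    return func(x + h) - 2.0 * func(x) + func(x - h)


def poisson_expectation(func, mean, kmax=400):
    """E[func(n)] for n ~ Poisson(mean), truncated at kmax."""
    total, p = 0.0, math.exp(-mean)
    for k in range(kmax + 1):
        total += p * func(k)
        p *= mean / (k + 1)
    return total


def log_stirling_lower(x):
    """Logarithm of (2*pi)**0.5 * x**(x + 0.5) * exp(-x), for x > 0."""
    return 0.5 * math.log(2.0 * math.pi) + (x + 0.5) * math.log(x) - x


def jensen_gap(mean):
    """E[ln Gamma(n + 1)] - ln Gamma(mean + 1) for n ~ Poisson(mean)."""
    expected = poisson_expectation(lambda k: math.lgamma(k + 1), mean)
    return expected - math.lgamma(mean + 1.0)
```

A positive `second_difference(math.lgamma, x, h)` corresponds to convexity at x, and a non-negative `jensen_gap(mean)` is the inequality used to replace the averaged $\log\Gamma(n+1)$ by $\log\Gamma(\alpha t + 1)$.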

This lower bound together with the true value of the entropy (for which there is no convenient analytical expression) are plotted against $\ln\alpha t$ in Figure 3. This plot is on a semilog scale with (X) representing the true entropy and (O) representing the lower bound. When viewed as a function of $\log\alpha t$, the bound is a straight line with slope 1/2. Investigation reveals that the difference between $H(W(X^t))$ and the lower bound reaches a minimum of approximately .355 at $\alpha t = .504$. In addition, when $H(W(X^t))$ is viewed as a function of $\ln\alpha t$, its slope is greater than 1/2 for all $\alpha t > .504$. This means that, given any $\alpha t_0 > .504$, we can bound $H(W(X^t))$ for all $\alpha t > \alpha t_0$ by adding $\frac{1}{2}\ln\alpha t$ to $[H(W(X^{t_0})) - \frac{1}{2}\ln\alpha t_0]$. We note that $H(W(X^t))$ at $\alpha t = 10$ has a slope (when viewed as a function of $\log\alpha t$) of approximately .509 and that the slope tends to .5 as $\alpha t \to \infty$, with the limit approached monotonically from above for $\alpha t > 10$. This suggests that $\alpha t = 10$ is a legitimate breakpoint for a piecewise lower bound to $H(W(X^t))$ given by

$$H(W(X^t)) \ge \begin{cases} 1.274 + \frac{1}{2}\ln\alpha t\ \text{nats} & \text{for } 0 < \alpha t < 10 \\ 1.410 + \frac{1}{2}\ln\alpha t\ \text{nats} & \text{for } \alpha t \ge 10 \end{cases} \qquad (39)$$

This bound is denoted by a ($\Delta$) in Figure 3.

To show that this lower bound is quite tight, we make use of an upper bound on the entropy of an integer-valued random variable given its variance. This bound is due to Massey (private communication).

Lemma 4: If X is an integer-valued discrete random variable with variance $\sigma_X^2$, then its entropy satisfies

$$H(X) \le \tfrac{1}{2}\log\Big(2\pi e\Big(\sigma_X^2 + \tfrac{1}{12}\Big)\Big).$$

Proof: We first form a continuous density function from the discrete distribution by

$$f_Y(y) = \Pr(X = x), \quad y \in (x - 1/2,\ x + 1/2).$$

The continuous entropy of the new random variable Y is equal to the discrete entropy of the original random variable X, since

$$-\int_{-\infty}^{\infty} f_Y(y)\log f_Y(y)\,dy = -\sum_{i=-\infty}^{\infty}\int_{i - 1/2}^{i + 1/2} f_Y(y)\log f_Y(y)\,dy = -\sum_{i=-\infty}^{\infty}\Pr(X = i)\log\Pr(X = i).$$

Using Shannon's bound for continuous random variables (Gallager, 1968) we have

$$H(X) = H(Y) \le \tfrac{1}{2}\log(2\pi e\,\sigma_Y^2) \qquad (40)$$

Now E(X) = E(Y) = m. Moreover,

$$
\begin{aligned}
\sigma_Y^2 &= \int_{-\infty}^{\infty} (t - m)^2 f_Y(t)\,dt \\
&= \sum_{k=-\infty}^{\infty} p_k \int_{k - 1/2}^{k + 1/2} (t - m)^2\,dt \\
&= \sum_{k=-\infty}^{\infty} p_k\Big[(k - m)^2 + \tfrac{1}{12}\Big] = \sigma_X^2 + \tfrac{1}{12}.
\end{aligned}
$$

Substituting in (40) gives the result.

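In our case, the bound of Lemma 4 becomes

$$H(W(X^t)) \le \tfrac{1}{2}\log\Big(2\pi e\Big(\alpha t + \tfrac{1}{12}\Big)\Big) \approx 1.42 + \tfrac{1}{2}\log\Big(\alpha t + \tfrac{1}{12}\Big)\ \text{nats} \qquad (41)$$

showing that our lower bound (39) is very tight for large values of $\alpha t$.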
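The bounds (38), (39) and (41) can be compared directly with a numerically evaluated Poisson entropy. The sketch below assumes a truncated sum for the entropy and uses the rounded constants quoted in the text, so the piecewise bound should only be expected to hold up to that rounding; the function names are hypothetical.

```python
import math


def poisson_entropy(mean):
    """Entropy in nats of a Poisson(mean) variable, by truncated summation."""
    kmax = int(mean + 40.0 * math.sqrt(mean) + 60)
    h = 0.0
    for k in range(kmax + 1):
        log_p = k * math.log(mean) - mean - math.lgamma(k + 1)
        h -= math.exp(log_p) * log_p
    return h


def stirling_lower_bound(mean):
    """Right side of (38): 0.5*ln(2*pi) + 0.5*ln(mean)."""
    return 0.5 * math.log(2.0 * math.pi * mean)


def piecewise_lower_bound(mean):
    """Right side of (39), using the rounded constants 1.274 and 1.410."""
    c = 1.274 if mean < 10.0 else 1.410
    return c + 0.5 * math.log(mean)


def massey_upper_bound(mean):
    """Right side of (41): 0.5*ln(2*pi*e*(mean + 1/12))."""
    return 0.5 * math.log(2.0 * math.pi * math.e * (mean + 1.0 / 12.0))
```

Tabulating the four functions over a range of means reproduces the qualitative picture described for Figure 3: the straight-line bound of (38), the tighter piecewise bound of (39), and the upper bound of (41) closing in on the true entropy as the mean grows.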


We summarize our results as

Theorem 2: If $W(X^t)$ is a Poisson random variable with mean $\alpha t$, then

$$H(W(X^t)) \ge \begin{cases} 1.274 + \frac{1}{2}\ln\alpha t\ \text{nats} & 0 < \alpha t < 10 \\ 1.41 + \frac{1}{2}\ln\alpha t\ \text{nats} & \alpha t \ge 10 \end{cases} \qquad (39)$$

and

$$H(W(X^t)) \le 1.42 + \tfrac{1}{2}\ln\Big(\alpha t + \tfrac{1}{12}\Big)\ \text{nats} \qquad (41)$$


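Using the bound (39) in

$$\liminf_{T\to\infty}\frac{1}{\alpha T} I\big(X^T; Y^T\big) \ge \max_t \frac{1}{\alpha t}\Big[H(W(X^t)) - \ln(1 + \alpha d) - \alpha d\,\ln\Big(\frac{1 + \alpha d}{\alpha d}\Big)\Big]\ \text{nats per message} \qquad (36)$$

we obtain the bound

$$\liminf_{T\to\infty}\frac{1}{\alpha T} I\big(X^T; Y^T\big) \ge \max_t \frac{1}{\alpha t}\Big[\Big\{\begin{matrix} 1.274 \\ 1.41 \end{matrix}\Big\} + \tfrac{1}{2}\ln\alpha t - f(\alpha d)\Big]\ \text{nats per message for}\ \Big\{\begin{matrix} 0 < \alpha t < 10 \\ \alpha t \ge 10 \end{matrix}\Big\} \qquad (42)$$

where

$$f(\alpha d) = \ln(1 + \alpha d) + \alpha d\,\ln\Big(\frac{1 + \alpha d}{\alpha d}\Big).$$

Let c equal 1.274. We can then see that if we maximize (42) with respect to $\alpha t$ the maximum occurs at

$$\alpha t = e^{2(f(\alpha d) - c) + 1}.$$

This leaves us with:

$$\liminf_{T\to\infty}\frac{1}{\alpha T} I\big(X^T; Y^T\big) \ge \frac{e^{2c}}{2e\,e^{2f(\alpha d)}} \qquad (43)$$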


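Using $(1 + 1/\alpha d)^{\alpha d} \le e$, so that $e^{2f(\alpha d)} \le e^2 (1 + \alpha d)^2$, and evaluating the numerator of (43), we arrive at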

Theorem  3;  R(d)  as  given  by  Equation  is  lower  bounded  by 


$$R(d) \ge \frac{.318}{(1 + \alpha d)^2}\ \text{nats per message} \qquad (44)$$


This bound (44) can be improved for $\alpha d > 10$ by using c = 1.41 instead of c = 1.274.
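The maximization over $\alpha t$ and the constant in (44) can be reproduced numerically. The sketch below assumes the forms of (42), (43) and (44) as written above, with c = 1.274 as the default constant; the function names are hypothetical.

```python
import math


def f(ad):
    """f(ad) = ln(1 + ad) + ad * ln((1 + ad) / ad), for ad > 0."""
    return math.log(1.0 + ad) + ad * math.log((1.0 + ad) / ad)


def objective(at, ad, c=1.274):
    """Expression maximised in (42): (c + 0.5*ln(at) - f(ad)) / at."""
    return (c + 0.5 * math.log(at) - f(ad)) / at


def best_at(ad, c=1.274):
    """Stationary point of the objective: at = exp(2*(f(ad) - c) + 1)."""
    return math.exp(2.0 * (f(ad) - c) + 1.0)


def bound_43(ad, c=1.274):
    """Right side of (43): exp(2c) / (2e * exp(2*f(ad)))."""
    return math.exp(2.0 * c) / (2.0 * math.e * math.exp(2.0 * f(ad)))


def bound_44(ad):
    """Right side of (44): 0.318 / (1 + ad)**2."""
    return 0.318 / (1.0 + ad) ** 2
```

Evaluating `objective(best_at(ad), ad)` returns the same value as `bound_43(ad)`, and `bound_44(ad)` stays below it; the two approach the same $(\alpha d)^{-2}$ behaviour as `ad` grows. Passing `c=1.41` shows the size of the improvement mentioned for large $\alpha d$.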

Note also that for large $\alpha d$ (44) approaches zero as $(\alpha d)^{-2}$. The bound given by (44) is plotted in Figure 3 along with Gallager's lower bound (4) and the performance of Gallager's best strategy (minimum of (7) and (9)), which provides an upper bound to the actual rate-distortion function which is the solution to the minimum protocol problem.


Chapter V

Discussion and Suggestions for Further Research

Let us review what we have accomplished at this point. We have found a lower bound (44) to the continuous-time version of the problem constructed on page 25, i.e., we lower bounded the solution to Gallager's rate-distortion problem (page 18) with the additional constraint (corresponding to (16)):


For every i, the process $X_{it}^{(i+1)t}$ is independent of the processes $X_0^{it}$ and $Y_0^{it}$ and the random variables $S_0$ and $S_{it}$, (45)

where $X_u^v$ represents the message arrival Poisson process from time u to time v, $Y_u^v$ represents the stochastic process for the message sending times from time u to time v and $S_v$ represents the number of messages which have arrived but have not yet been sent at time v.

As pointed out in Chapter IV, this added constraint is a partial causality constraint. If we divide time up into blocks of t seconds, then (45) allows us to observe the X process for an entire block before deciding when to send messages in that block. It does not allow us to observe the X process for any time past the end of the present block, so the largest length of time that we are allowed to look ahead is t seconds.

Since we eventually would like to find the minimum protocol information required, we would really like to solve the following problem, which we shall call the minimum protocol problem:

Find $R(d) = \lim_{N\to\infty} R_N(d)$

where $R_N(d) = \inf_{P_N(d)} \frac{1}{N} I(X^N; Y^N)$

where $P_N(d)$ is the class of probability measures which satisfy the constraints for $P_N(d)$ given in Chapter II (Gallager's constraints) and also satisfy the following constraint.

The complete causality constraint: Assume that, at any time t, i messages have arrived from the source and j messages have been sent. If $y_{j+1}$ is the time the (j+1)st message is sent, then $\Pr(t \le y_{j+1} < t + \delta)$ must be independent of $\{x_k,\ k = i+2, i+3, \ldots\}$ where $x_k$ is the time the k-th message arrives from the source. But $\Pr(t \le y_{j+1} < t + \delta)$ may depend on $x_{i+1}$ only through the event $\{x_{i+1} > t\}$.

It can be readily seen that the complete causality constraint is a very difficult constraint to incorporate in a precise mathematical framework. We note that in the discrete-time framework the constraint becomes much simpler. If $x_i$ equals one when a message arrives in the i-th time interval and zero if no message arrives in the interval, and if $y_i$ equals one if a message is sent in the i-th time interval and zero if no message is sent in the interval, then the discrete-time complete causality constraint is:

$y_i$ must be independent of $\{x_j,\ j = i+1, i+2, \ldots\}$ and

i i 

I ^ 

The fact that the partial causality constraint is a weaker constraint than the complete causality constraint means that the solution to the partially causal problem is a minimization over a larger set than the solution to the minimum protocol problem, and thus the partially causal solution is a lower bound to the minimum protocol solution. We also note that Gallager's optimal strategies ((9) and (11)) are completely causal and they provide an upper bound to the minimum protocol solution.

We have achieved a lower bound (44) to the partial causality problem and thus to the minimum protocol problem.

As $\alpha d$ goes to infinity, this bound approaches zero as $(\alpha d)^{-2}$. As can be seen from Figure 3, there is still a gap between the lower bound (44) and the upper bound to the minimum protocol solution, i.e., the performance of the best attainable strategy (9). The upper bound (9)

approaches  zero  as  ^ . Due  to  the  fact  that  the  lower 

(ad)^ 


bound  was  found  using  a relaxed  causality  constraint,  we 
conjecture  that  the  upper  bound  is  closer  to  the  actual 
minimum  protocol  solution  than  the  lower  bound. 


In reviewing our derivation we see that the first time that we might violate a complete causality constraint was in applying the Hamming weight function W to our X and Y sequence in equation (27). We first note that this step is necessary to finding a bound on the second term, so our additional analysis could not be applied unchanged if the complete causality constraint were added.

The way that the complete causality constraint may be violated by applying the W function is that we are allowed to look ahead n time units at the beginning of a block before we must pick our Y vector for this block. What saves our bound from triviality is that we then maximize over n. For n small, the inequalities used to find the bound are not very tight. As n increases, the inequalities get better but eventually the effect of the non-causality begins to dominate.

There are two ways to approach this problem. One way is to start with the results from Theorem 1 and try to proceed without using the Hamming weight function. This approach does not appear to be fruitful.

A second way is to try to extend Gallager's approach. It can be shown that $R_N(d)$ as given by Equation (2) provides a lower bound to R(d) for any N. We have expended considerable effort to no avail trying to find $R_2(d)$ for the continuous-time case using standard rate-distortion methods. These methods lead to integral equations involving the probability density $W(y_1, y_2)$. We have been able to prove that, unlike $W(y_1)$ in the $R_1(d)$ problem, $W(y_1, y_2)$ is ill-behaved. It seems that a more promising approach is to try to solve the $R_2(d)$ problem for the discrete case in order to obtain more insight before renewing the attack on the continuous-time problem. This approach has the advantage of allowing the use of computer methods and it also allows a simpler introduction of the complete causality constraint.